# Full text of "MixEst: An Estimation Toolbox for Mixture Models"


```arXiv:1507.06065v1 [stat.ML] 22 Jul 2015

Journal of Machine Learning Research 1 (2015) 1-5

Submitted 4/00; Published 10/00

MixEst: An Estimation Toolbox for Mixture Models

School of ECE, College of Engineering
University of Tehran
Tehran, Iran

School of ECE, College of Engineering
University of Tehran
Tehran, Iran

Editor:

Abstract

Mixture models are powerful statistical models used in many applications ranging from
density estimation to clustering and classification. When dealing with mixture models,
there are many issues that the experimenter should be aware of and needs to solve. The
MixEst toolbox is a powerful and user-friendly package for MATLAB that implements
several state-of-the-art approaches to address these problems. Additionally, MixEst supports
manifold optimization for fitting the density model, a feature specific
to this toolbox. MixEst simplifies the use and integration of mixture models in statistical
models and applications. To develop mixture models of new densities, the user only
needs to provide a few functions for that statistical distribution, and the toolbox takes care
of all the issues regarding mixture models. MixEst is available at visionlab.ut.ac.ir/mixest.

Keywords: mixture models, mixtures of experts, manifold optimization,
expectation-maximization, stochastic optimization

1. Introduction

Mixture models are an integral and fundamental component of many machine learning
problems ranging from clustering to regression and classification (McLachlan and Peel,
2000). Estimating the parameters of mixture models is a challenging task due to the need
to solve the following issues in mixture modeling:

• Unboundedness of the likelihood: This problem occurs when one component gets a
small number of data points and its likelihood becomes infinite (Ciuperca et al., 2003).

• Local maxima: The log-likelihood objective function for estimating the parameters of
mixture models is non-concave and has many local maxima (Ueda et al., 2000).

• Correct number of components: In many applications, the correct number of
components needs to be found (Khalili and Chen, 2007).

Addressing these issues for a mixture density that is not available in common mixture
modeling toolboxes costs the experimenter a lot of time and effort. MixEst addresses
all these issues not only for already implemented densities, but also for densities that the
user may implement. By implementing densities, we mean implementing a few simple
functions, which are briefly discussed in Section 3.

Hosseini and Mash’al

This toolbox provides a framework for applying manifold optimization to estimate the
parameters of mixture models. This is an important feature of this toolbox, because recent
empirical evidence shows that manifold optimization can surpass expectation maximization
in the case of mixtures of Gaussians (Hosseini and Sra, 2015). It also opens the door to
large-scale optimization by using stochastic optimization methods. Stochastic optimization
also allows solving the likelihood unboundedness problem mentioned above, without the
need to implement a penalizing function for the parameters of the density.

While several libraries are available for working with mixture models, to the best of our
knowledge, none of them offers a modular and flexible framework that allows for fine-tuning
the model structure or provides universal algorithms for estimating model parameters
that solve all the problems listed above. A review of the features available in some libraries
is given in Section 4.

In the next section, we give a short overview of the toolbox and its features.

2. About the MixEst Toolbox

This toolbox offers methods for constructing and estimating mixtures for joint density and
conditional density modeling. It is therefore applicable to a wide variety of applications
like clustering, regression and classification through a probabilistic model-based approach.
Each distribution in this toolbox is a structure containing a manifold structure representing
the parameter space of the distribution, along with several function handles implementing
density-specific functions like log-likelihood, sampling, etc. Distribution structures are
constructed by calling factory functions with appropriate input arguments defining the
distribution. For example, to construct a mixture of one-dimensional Gaussians with 2
components, it suffices to write the following commands in MATLAB:

Dmvn = mvnfactory(1);            % one-dimensional Gaussian distribution
Dmix = mixturefactory(Dmvn, 2);  % mixture of 2 such Gaussians

As an example of how to invoke a function handle, consider generating 1000 samples from
the previously defined mixture:

theta.D{1}.mu = 0; theta.D{1}.sigma = 1; % mean and variance of the 1st component
theta.D{2}.mu = 5; theta.D{2}.sigma = 2; % mean and variance of the 2nd component
theta.p = [0.8 0.2]; % weighting coefficients of components
data = Dmix.sample(theta, 1000);

Each distribution structure exposes a common interface that optimization algorithms in the
toolbox can use to estimate its parameters. In addition to the EM algorithm, which is
commonly implemented in available libraries, our toolbox also makes optimization
on manifolds available, featuring procedures like early stopping and mini-batching to avoid
overfitting. For optimization on manifolds, our toolbox depends on the optimization procedures
of an excellent toolbox called Manopt (Boumal et al., 2014). In addition to the optimization
algorithms of Manopt, like steepest descent, conjugate gradient and trust-region methods,
the user can also use our implementation of the Riemannian LBFGS method.

MixEst Toolbox for Mixture Models

3. Model Development

MixEst includes many joint and conditional distributions to model data ranging from
continuous to discrete and also directional. Some users, however, may want to apply the tools
developed in this toolbox to mixtures of a distribution not yet available in the toolbox. To
this end, the user needs to write a factory function that constructs a structure for the new
distribution.

Each distribution structure has a field named “M” determining the manifold of its
parameter space. For example, for the multivariate Gaussian distribution, this is a
product manifold of a positive definite manifold and a Euclidean manifold:

% datadim is the function input argument determining the dimensionality of data
% muM: Euclidean manifold of the mean; sigmaM: positive definite manifold of the covariance
D.M = productmanifold(struct('mu', muM, 'sigma', sigmaM));

The manifold of the parameter space completely determines how the parameter structure is
passed to or returned by different functions. The parameter structure for the multivariate
Gaussian has two fields, a mean vector “mu” and a covariance matrix “sigma”.

To use the estimation tools of the toolbox, two main functions have to be implemented:
the weighted log-likelihood (wll) function and a function for computing the gradient of
sum-wll with respect to the distribution parameters. The syntax for calling the wll function is:

llvec = D.llvec(theta, data);

The input argument theta is a structure containing the input parameters of the
corresponding distribution. The second input argument data can be either a data matrix or
a structure having several fields such as the data matrix and weights, which is interpreted
using the mxe_readdata function. The output argument llvec is a vector with entries
equal to wll for each datum (each column) in the data matrix.

The function to compute the gradient of sum-wll has the following syntax:

llgrad = D.llgrad(theta, data);

The input arguments are similar to the function llvec. The output argument llgrad is
a structure similar to the input argument theta returning the gradient of sum-wll with
respect to each parameter.

Some other (optional) functions that can be implemented for distributions are:

• init: This function initializes the estimator using the data.

• estimatedefault: If the maximum wll has a structure that allows fast optimization
(or has a closed-form solution), this estimator can be implemented in this function.
When this function is not present, Riemannian optimization is called in the
maximization step of the EM algorithm.

• llgraddata: This function computes the gradient of wll with respect to the data.
It is required in some special cases, such as when the distribution is used as the
radial component of an elliptically-contoured distribution or as the components in
independent component analysis.

• ll: This function is sum-wll (the sum of the output vector of the llvec function). Sometimes
it is faster to write this function differently than just calling llvec and summing up
its output vector.

Two other functions that can be used in the split-and-merge algorithms to avoid local
maxima of mixture models are kl (for computing the KL-divergence) and entropy (for
computing the entropy). If the user wants to compute a maximum-a-posteriori estimate, the
functions penalizerparam, penalizercost and penalizergrad need to be implemented.

4. Feature Comparison

To demonstrate the richness of features in MixEst, we compare its features with those of
several other well-known packages in Table 1. Among the many toolboxes available for
mixture modeling, we select those that are feature-rich and representative. These packages
are Sklearn (Pedregosa et al., 2011), Mclust (Fraley and Raftery, 1999), FlexMix (Leisch,
2004), Bayes Net (Murphy, 2001) and MixMod (Biernacki et al., 2006). We include Bayes
Net to demonstrate what a generic Bayesian graphical modeling toolbox can do. Sklearn is
a powerful machine learning toolbox containing many tools, including tools specific to
mixture modeling. MixMod also provides bindings for Scilab and Matlab.

Table 1: Feature comparison of our toolbox and some other well-known packages. Different
rows correspond to the following specifications of different toolboxes: 1. Programming
language; 2. Approaches for solving local minima problem (SM stands
for split-and-merge approach, IDMM for infinite Dirichlet mixture models, HC
stands for initialization using hierarchical clustering); 3. Manifold optimization;
4. Bayesian approaches for inference (MAP stands for maximum-a-posteriori,
VB stands for variational Bayes); 5. Large-scale optimization (SEM stands for
stochastic EM, MB stands for mini-batching); 6. Having tools for model selection;
7. Automatic model selection (CSM stands for competitive split-and-merge);
8. Ease of extensibility; 9. Having mixtures of experts; 10. Having mixtures of
classifiers; 11. Having mixtures of regressors.

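Columns: MixEst | SKlearn | Mclust | FlexMix | Bayes Net | MixMod

# 1:  Matlab | Python | R | R | Matlab | C++
# 2:  SM, IDMM, HC
# 3:  Yes | No | No | No | No | No
# 4:  MAP, VB, MAP, MAP, SM
# 5:  MB, SEM
# 6:  Yes | No | Yes | No | No | Yes
# 7:  CSM, IDMM
# 8:  Easy, Easy, Medium
# 9:  Yes | No | No | No | Yes | No
# 10: Yes | No | No | No | Yes | No
# 11: Yes | No | No | Yes | Yes | No

Rows 2, 4, 5, 7 and 8 contain empty cells; only their non-empty entries are listed, in
column order.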


References

Christophe Biernacki, Gilles Celeux, Gerard Govaert, and Florent Langrognet. Model-based
cluster and discriminant analysis with the mixmod software. Computational Statistics and
Data Analysis, 51(2):587-600, 2006.

Nicolas Boumal, Bamdev Mishra, P.-A. Absil, and Rodolphe Sepulchre. Manopt, a matlab
toolbox for optimization on manifolds. Journal of Machine Learning Research, 15:1455-1459,
2014.

Gabriela Ciuperca, Andrea Ridolfi, and Jerome Idier. Penalized maximum likelihood
estimator for normal mixtures. Scandinavian Journal of Statistics, 30(1):45-59, March
2003.

Chris Fraley and Adrian E Raftery. Mclust: Software for model-based cluster analysis.
Journal of Classification, 16(2):297-306, 1999.

Reshad Hosseini and Suvrit Sra. Manifold optimization for Gaussian mixture models. arXiv
preprint arXiv:1506.07677, 06 2015.

Abbas Khalili and Jiahua Chen. Variable selection in finite mixture of regression models.
Journal of the American Statistical Association, 102(479):1025-1038, September 2007.

Friedrich Leisch. FlexMix: A general framework for finite mixture models and latent class
regression in R. Journal of Statistical Software, 11(8):1-18, 2004.

Geoffrey McLachlan and David Peel. Finite mixture models. John Wiley and Sons, New
Jersey, 2000.

Kevin P. Murphy. The Bayes Net toolbox for matlab. Computing Science and Statistics,
33:2001, 2001.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,
P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau,
M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in python.
Journal of Machine Learning Research, 12:2825-2830, 2011.

Naonori Ueda, Ryohei Nakano, Zoubin Ghahramani, and Geoffrey E. Hinton. Split and
merge EM algorithm for improving Gaussian mixture density estimates. The Journal of
VLSI Signal Processing, 26(1):133-140, 2000.

```
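Section 3 of the paper states that a new distribution needs a weighted log-likelihood function returning one value per datum and a function returning the gradient of the summed weighted log-likelihood as a structure shaped like theta. The following is a minimal, self-contained MATLAB sketch of that contract for a one-dimensional Gaussian. It does not call MixEst or Manopt: the names gauss_llvec and gauss_llgrad, the explicit weight argument w, and the test data are assumptions made for illustration only. The gradient shown is the plain Euclidean one, and how the toolbox relates it to the manifold structure is not covered here.

```matlab
% Hypothetical helpers (not MixEst code). theta.mu is the mean and
% theta.sigma the variance of a 1-D Gaussian. x is a 1-by-N row vector
% (one datum per column) and w is a 1-by-N vector of weights.
gauss_llvec = @(theta, x, w) ...
    w .* (-0.5 * log(2 * pi * theta.sigma) - (x - theta.mu).^2 ./ (2 * theta.sigma));

% Euclidean gradient of sum(gauss_llvec(...)) with respect to mu and sigma,
% returned as a structure with the same fields as theta.
gauss_llgrad = @(theta, x, w) struct( ...
    'mu',    sum(w .* (x - theta.mu)) / theta.sigma, ...
    'sigma', sum(w .* ((x - theta.mu).^2 / (2 * theta.sigma^2) - 1 / (2 * theta.sigma))));

% Small fixed data set so that the check below is deterministic.
x = [-1.0, 0.5, 2.0, 3.5];
w = [1.0, 0.5, 2.0, 1.0];
theta = struct('mu', 0.7, 'sigma', 1.3);

llvec = gauss_llvec(theta, x, w);   % one weighted log-likelihood per datum
grad  = gauss_llgrad(theta, x, w);  % gradient of the summed weighted log-likelihood

% Central finite differences as an independent check of the gradient.
h = 1e-6;
sumll = @(mu, sigma) sum(gauss_llvec(struct('mu', mu, 'sigma', sigma), x, w));
fd_mu    = (sumll(theta.mu + h, theta.sigma) - sumll(theta.mu - h, theta.sigma)) / (2 * h);
fd_sigma = (sumll(theta.mu, theta.sigma + h) - sumll(theta.mu, theta.sigma - h)) / (2 * h);

fprintf('grad.mu    = %.6f (finite difference %.6f)\n', grad.mu, fd_mu);
fprintf('grad.sigma = %.6f (finite difference %.6f)\n', grad.sigma, fd_sigma);
```

Comparing an analytic gradient against finite differences in this way is a cheap sanity check before plugging a new density into any gradient-based estimator.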